Computing Iceberg Queries Efficiently
Paper Number 234
Abstract
Many applications compute aggregate functions (such as COUNT, SUM) over an attribute (or set of attributes) to find aggregate values above some specified threshold. We call such queries iceberg queries because the number of above-threshold results is often very small (the tip of an iceberg), relative to the large amount of input data (the iceberg). Such iceberg queries are common in many applications, including data warehousing, information retrieval, market basket analysis in data mining, clustering and copy detection. We propose efficient algorithms to evaluate iceberg queries using very little memory and significantly fewer passes over data, as compared to current techniques that use sorting or hashing. We present an experimental case study using over three gigabytes of Web data to illustrate the savings obtained by our algorithms.
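The abstract does not spell out the algorithms, so the following is only a hedged illustration of the contrast it draws. The Python sketch compares a plain hash-aggregation baseline (one counter per distinct value) with a two-pass hashed-bucket ("coarse counting") filter whose first pass keeps only a fixed-size array of counters. The function names, the CRC32 bucket hash, and the `num_buckets` parameter are assumptions introduced for illustration. This is one well-known way to trade a second scan for less memory, not a claim about the paper's exact algorithm.

```python
import zlib
from collections import Counter
from typing import Dict, Sequence


def naive_iceberg(records: Sequence[str], threshold: int) -> Dict[str, int]:
    """Baseline: exact hash aggregation with one counter per distinct value."""
    counts = Counter(records)
    return {value: c for value, c in counts.items() if c >= threshold}


def bucket_of(value: str, num_buckets: int) -> int:
    """Map a value to a bucket. CRC32 is an assumption chosen because it is
    deterministic (Python's built-in str hash is randomized per process)."""
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def coarse_count_iceberg(
    records: Sequence[str], threshold: int, num_buckets: int
) -> Dict[str, int]:
    """Hypothetical two-pass hashed-bucket filter (illustrative, not the paper's code)."""
    # Pass 1: memory is num_buckets integers, regardless of how many distinct values exist.
    bucket_counts = [0] * num_buckets
    for value in records:
        bucket_counts[bucket_of(value, num_buckets)] += 1

    # A bucket's count is an upper bound on the count of every value in it,
    # so any value that meets the threshold must fall in a "heavy" bucket.
    heavy = {b for b, c in enumerate(bucket_counts) if c >= threshold}

    # Pass 2: count exactly, but only the values that hash to heavy buckets.
    candidates = Counter(v for v in records if bucket_of(v, num_buckets) in heavy)
    return {value: c for value, c in candidates.items() if c >= threshold}


if __name__ == "__main__":
    data = ["a"] * 5 + ["b"] * 3 + ["c", "d", "e"]
    print(naive_iceberg(data, 4))                        # {'a': 5}
    print(coarse_count_iceberg(data, 4, num_buckets=4))  # {'a': 5}
```

Both functions return the same answer: the bucket filter never drops a qualifying value, and the exact second pass removes any false candidates. The price is the extra scan. If `num_buckets` is too small relative to the data, most buckets become heavy and the second pass degenerates to the baseline.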
Similar resources
Computing Iceberg Queries Efficiently
Many applications compute aggregate functions over an attribute (or set of attributes) to find aggregate values above some specified threshold. We call such queries iceberg queries, because the number of above-threshold results is often very small (the tip of an iceberg), relative to the large amount of input data (the iceberg). Such iceberg queries are common in many applications, including data w...
Computing Iceberg Queries Efficiently
Many applications compute aggregate functions over an attribute (or set of attributes) to find aggregate values above some specified threshold. We call such queries iceberg queries, because the number of above-threshold results is often very small (the tip of an iceberg), relative to the large amount of input data (the iceberg). Such iceberg queries are common in many applications, including dat...
Partitioning-based algorithms for approximate and exact Iceberg Queries
In many applications it is necessary to identify items that occur frequently within a data set, which may be a materialized or non-materialized relation. Such queries were recently denoted as iceberg queries. Several algorithms for computing iceberg queries were presented, including an approximation algorithm based on concise sampling and an exact algorithm based on sampling combined with multip...
Efficient Computing of Iceberg Queries Using Quantiling
Iceberg queries have recently been identified as important queries for many applications. These queries are characterized by a huge input and a small output: the iceberg refers to the input, and its tip refers to the output. We present an efficient algorithm for computing an important class of iceberg queries. This algorithm uses a focusing technique for the query result using quantiling...
Methods for Evaluating Iceberg Queries
Iceberg queries are a special case of SQL queries involving GROUP BY and HAVING clauses, wherein the answer set is small relative to the database size. Iceberg queries have recently been identified as important queries for many applications. Such queries are characterized by a huge input and a small output: the iceberg refers to the input, and its tip refers to the output. This paper is going...
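The GROUP BY/HAVING shape described above can be made concrete with a small query. This is a minimal sketch assuming a hypothetical single-column relation `R(target)` loaded into an in-memory SQLite database through Python's standard `sqlite3` module. The table name, column name, and helper function are illustrative only. A real iceberg query would run over a far larger relation, where the HAVING clause discards almost every group.

```python
import sqlite3
from typing import List, Sequence, Tuple


def iceberg_sql(rows: Sequence[str], threshold: int) -> List[Tuple[str, int]]:
    """Run a COUNT-based iceberg query over a hypothetical relation R(target)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE R (target TEXT)")
        conn.executemany("INSERT INTO R (target) VALUES (?)", [(r,) for r in rows])
        # GROUP BY forms one group per target value; HAVING keeps only the
        # groups whose count reaches the threshold (the "tip of the iceberg").
        cur = conn.execute(
            "SELECT target, COUNT(*) FROM R "
            "GROUP BY target HAVING COUNT(*) >= ? ORDER BY target",
            (threshold,),
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(iceberg_sql(["x", "y", "x", "z", "x", "y"], 2))  # [('x', 3), ('y', 2)]
```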
Publication date: 1998